Improving the Estimation of Word Importance for News Multi-Document Summarization

Authors

  • Kai Hong
  • Ani Nenkova
Abstract

In this paper, we propose a supervised model for ranking word importance that incorporates a rich set of features. Our model is superior to prior approaches for identifying words used in human summaries. Moreover, we show that an extractive summarizer which includes our estimation of word importance results in summaries comparable with the state-of-the-art by automatic evaluation.

Disciplines: Computer Engineering | Computer Sciences

Comments: University of Pennsylvania Department of Computer and Information Science Technical Report No. MSCIS-14-02. This technical report is available at ScholarlyCommons: http://repository.upenn.edu/cis_reports/989

Kai Hong, University of Pennsylvania, Philadelphia, PA, 19104
Ani Nenkova, University of Pennsylvania, Philadelphia, PA, 19104

Similar resources

Improving the Estimation of Word Importance for News Multi-Document Summarization - Extended Technical Report

In this paper, we propose a supervised model for ranking word importance that incorporates a rich set of features. Our model is superior to prior approaches for identifying words used in human summaries. Moreover we show that an extractive summarizer which includes our estimation of word importance results in summaries comparable with the state-of-the-art by automatic evaluation. Disciplines Co...

EXTRACTION-BASED TEXT SUMMARIZATION USING FUZZY ANALYSIS

Due to the explosive growth of the world-wide web, automatic text summarization has become an essential tool for web users. In this paper we present a novel approach for creating text summaries. Using fuzzy logic and word-net, our model extracts the most relevant sentences from an original document. The approach utilizes fuzzy measures and inference on the extracted textual information from the docu...

A New Document Embedding Method for News Classification

Text classification is one of the main tasks of natural language processing (NLP). In this task, documents are classified into pre-defined categories. A great deal of news spreads on the web. A text classifier can categorize news automatically, which facilitates and accelerates access to the news. The first step in text classification is to represent documents in a suitable way t...

Use of Multiple Features for Extracting Topics from News Clusters

In this paper we consider a method for extraction of sets of semantically similar language expressions representing different participants of the text story – thematic nodes. The method is based on the structural organization of news clusters and exploits comparison of various contexts of words. The word contexts are used as a basis for multiword expression extraction and thematic node construc...

Improving the Performance of the Random Walk Model for Answering Complex Questions

We consider the problem of answering complex questions that require inferencing and synthesizing information from multiple documents and can be seen as a kind of topic-oriented, informative multi-document summarization. The stochastic, graph-based method for computing the relative importance of textual units (i.e. sentences) is very successful in generic summarization. In this method, a sentence...

Publication date: 2014